Data Loading


First I can iterate over all pred_df.csv files in the Output folder and extract the observed effect, average effect, and WT effect which can be used for the transformations, producing the merged file.

observed_effect average_effect wt_effect unique_id
0.000000 -1.585059 0.000000 AP_catef_1
-2.924453 -2.827816 -2.924453 AP_catef_1
-4.943095 -5.610975 -4.943095 AP_catef_1
-5.623423 -6.853733 -7.867548 AP_catef_1
-2.352617 -1.596688 -2.352617 AP_catef_1
-3.156145 -2.839445 -5.277070 AP_catef_1

Monotonic spline non-linear transform


Assessing degrees for splines

I fit cubic splines with penalty towards monotonicity by varying degrees ranging from 1-5. Then, I used AIC to select the strongest model for each landscape, and log the number of degrees for the upcoming transform.

Unique_ID Best_Degree Best_AIC
AP_catef_1 4 79.375299
DHFR_ic50_c57 1 22.838106
DHFR_ic50_c58 1 29.992922
DHFR_ic50_c59 4 18.441344
DHFR_ic50_c60 5 24.490984
DHFR_ic50_c61 5 9.666351
DHFR_ic75_palmer 5 197.681499
DHFR_kcat_trajg 4 6.681135
DHFR_kcat_trajr 5 -3.673020
DHFR_ki_trajg 5 18.610525
DHFR_ki_trajr 4 6.720748
MPH_catact_CaPTM 1 35.963443
MPH_catact_CdPTM 1 -11.111711
MPH_catact_CoPTM 1 24.892223
MPH_catact_CuPTM 1 23.676399
MPH_catact_MgPTM 1 26.680008
MPH_catact_MnPTM 1 11.392128
MPH_catact_NiPTM 1 4.306652
MPH_catact_ZnPTM 4 46.094248
NfsA_ec50_2039 5 -101.318022
NfsA_ec50_3637 5 -65.972370
OXA-48_ic50_CAZtraj1 5 -32.759419
OXA-48_ic50_CAZtraj2 4 -26.712493
OXA-48_ic50_CAZtraj3 1 -99.376569
OXA-48_ic50_PIPtraj1 5 6.216836
OXA-48_ic50_PIPtraj2 5 28.085051
OXA-48_ic50_PIPtraj3 4 -20.227528
PTE_catact_2NH 5 54.866422
PTE_catact_butyrate 5 -15.140782
TEM_MIC_weinreich 5 39.737284
TEM_growth_AM 4 13.245353
TEM_growth_AMC 4 7.064258
TEM_growth_AMP 5 42.425104
TEM_growth_CAZ 1 40.127210
TEM_growth_CEC 1 39.052728
TEM_growth_CPD 5 32.778424
TEM_growth_CPR 1 30.292993
TEM_growth_CRO 5 40.952662
TEM_growth_CTT 1 45.810669
TEM_growth_CTX 5 36.284359
TEM_growth_CXM 4 23.141110
TEM_growth_FEP 1 26.596062
TEM_growth_SAM 5 22.150169
TEM_growth_TZP 5 21.309300
TEM_growth_ZOX 5 23.862908

Visual spline model evaluation

I am then able to plot what the transformation function looks like (in blue) for each landscape with the x-axix representing additive effects, and y axis representing observed effects.

Visualization of post-transformed data

We can also inspect the transformed data by plotting scatterplots against the previous observed effects in order to gauge how the data will be transformed for each landscape.

## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-10.1510461496875, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.3763691616875, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.669048522875, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-0.6773956788125, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.0561651719375001, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(0.1505184723125, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.175174974098038, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-2.0188274643125, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-1.6236400111875, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.692822294062501, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-0.564818362125001, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.473301722907051, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(0.142740045279341, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.714866858571236, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.256493158789162, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.525714893344278, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.65621332985174, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.456627754334221, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-0.687987888097287, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.511775567217363, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.33614802335173, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.28127227342755, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-0.282094961196305, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.160628161189058, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-2.25266045395833, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-2.31931248227604, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-1.99818581863791, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.188262458006133, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.591142196043501, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.751685262437498, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases

## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-0.616625, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases

## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-0.728625, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-1.11625, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases

## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.77925, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases

## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-1.9028125, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.428875, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases

## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-1.5428125, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.637625, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases

## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.204875, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.3609375, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases

## Warning in bs(average_effect, degree = 4L, knots = numeric(0), Boundary.knots =
## c(-0.6940625, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases

## Warning in bs(average_effect, degree = 1L, knots = numeric(0), Boundary.knots =
## c(-0.32075, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-1.473875, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases

## Warning in bs(average_effect, degree = 5L, knots = numeric(0), Boundary.knots =
## c(-0.3038125, : some 'x' values beyond boundary knots may cause ill-conditioned
## bases

Evaluation of four parameter transformations


The drawback of using monotonic splies (or even power transforms), as can be seen from the plots above, is the lack of bounding. Bounding is crucial to avoid greatly transforming phenotype values that lay outside of the bounds of the transform, which itself is constrained by the range of the predicted first-order effects.

In other words, phenotypes with magnitudes that lay outside of the range of predicted values by the first-order model are at risk of being incorrectly transformed. Though this is somewhat true for a four-parameter model, the bounded upper- and lower- thresholds ensure that phenotypes that lay outside of the predicted range will be approximated more accurately, i.e., the transformation will be lower in magnitude than other models.

Four parameter function transform

Here we attempt to transform all datasets using the four-paramter function, with fitting performed by non-linear least squares regression using nlsLM. The fits are compared to simple linear models using an AIC to determine whether the additional information stemming from the four-parameter transform is parsimonous, otherwise, no transform is applied.

## Error in nlsModel(formula, mf, start, wts) : 
##   singular gradient matrix at initial parameter estimates

## Error in nlsModel(formula, mf, start, wts) : 
##   singular gradient matrix at initial parameter estimates

Indeed, these fits appear much better visually, and we also avoid overfitting to datasets by simply utilizing the linear model. We can then use this analysis to transform and export these new data.

Performing transformations and exporting data

Upon performing transformations, it is also useful to see how the R2 value improves after transformations. Provided is the table of the pre- and post- transformation R2 values for a linear regression.

Data export


Here we need to re-import the data with the genotype column, and also get a list of columns that start with “p”, i.e., for AP we need to get p101, p166, p153, p322, and p328. We can scrub the p’s and save this as a vector.

The output csv files should have the following format (an example for AP):

WT
WT-seq
Positions
p101 p166 p153 p322 p328
Genotype Function
DRDEK 0

and so on…

## Error in nlsModel(formula, mf, start, wts) : 
##   singular gradient matrix at initial parameter estimates
## Error in nlsModel(formula, mf, start, wts) : 
##   singular gradient matrix at initial parameter estimates

After the export is done, the data can be re-analyzed using the other scripts in this directory.